An Exploration of Formalized Information Retrieval Heuristics

نویسندگان

  • Hui Fang
  • Tao Tao
  • ChengXiang Zhai
چکیده

Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. Any effective retrieval formula, no matter how it is originally motivated, also often boils down to an explicit or implicit implementation of these heuristics. One basic research question is thus what are exactly these “necessary” heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of these retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is only satisfied for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well they satisfy these constraints. Thus the proposed constraints can provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Exploration of Formalized Retrieval Heuristics

Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. Any effective retrieval formula, no matter how it is originally motivated, also often boils down to an explicit or implicit implementation of these heuristics. One basic research question is thus what are exactly these “...

متن کامل

Applying Heuristics to Improve A Genetic Query Optimisation Process in Information Retrieval

This work presents a genetic approach for query optimisation in information retrieval. The proposed GA is improved y heuristics in order to solve the relevance multimodality problem and adapt the genetic exploration process to the information retrieval task. Experiments with AP documents and queries issued from TREC show the effectiveness of our GA model

متن کامل

Multiple query evaluation based on an enhanced genetic algorithm

Recent studies suggest that significant improvement in information retrieval performance can be achieved by combining multiple representations of an information need. The paper presents a genetic approach that combines the results from multiple query evaluations. The genetic algorithm aims to optimise the overall relevance estimate by exploring different directions of the document space. We inv...

متن کامل

Exploration of Proximity Heuristics in Length Normalization

Ranking functions used in information retrieval are primarily used in the search engines and they are often adopted for various language processing applications. However, features used in the construction of ranking functions should be analyzed before applying it on a data set. This paper gives guidelines on construction of generalized ranking functions with applicationdependent features. The p...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008